Add per-operation upstream override for REST API operations #2012

Open
mehara-rothila wants to merge 18 commits into
wso2:mainfrom
mehara-rothila:feat/per-op-upstream-gateway

@mehara-rothila

@mehara-rothila mehara-rothila commented May 22, 2026

Contributor

Summary

Adds a per-operation upstream override on REST API operations: an operation can route its main and/or sandbox traffic to a different backend than the API-level upstream. Per-op targets are ref-only: each references a named entry in spec.upstreamDefinitions rather than carrying an inline URL. Operations without a per-op upstream fall back to the API-level upstream, exactly as before.

A per-op route reuses the referenced upstreamDefinition's cluster (one cluster per definition, so operations sharing a ref share it) rather than minting a cluster per operation. API-level main/sandbox cluster names are URL-stable (derived from a hash of the API's identity, not the URL). Both kinds of cluster name are therefore URL-independent, so a URL edit updates Envoy endpoints in place instead of recreating clusters.

Why

Real-world APIs often need individual operations to reach different backend services. Today that requires path-based workarounds or splitting the API into multiple APIs. This change lets the API definition declare per-operation routing directly, while backend URLs stay defined once in upstreamDefinitions and are referenced by name.

Design decisions

Ref-only at the operation level. operations[].upstream.main / .sandbox carry only a ref to a named upstreamDefinition, with no inline url and no per-target timeout/hostRewrite. Backends are declared once in spec.upstreamDefinitions; connect timeout lives on the definition. (API-level upstream still accepts either url or ref, unchanged.)

Per-op routes reuse the definition cluster. A per-op ref does not create a new cluster. It points the operation's route at the cluster the referenced upstreamDefinition already owns (upstream_<kind>_<apiID>_<defName>, built once per definition). N operations sharing a ref cost one cluster, not N, and the route inherits the definition's authoritative base path. cluster_header stays on with that cluster as the default, so a dynamic-endpoint policy can still override the operation (precedence: op-policy > api-policy > per-op ref > api-level upstream).
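
The precedence order above can be sketched as a first-non-empty lookup. This is a simplified model, not the gateway's code: in the real design the policy override happens at request time through cluster_header, and the type and function names below (routeCandidates, effectiveCluster) are hypothetical.

```go
package main

import "fmt"

// routeCandidates holds hypothetical, already-resolved cluster names for one
// operation. An empty string means that source did not supply a cluster.
type routeCandidates struct {
	OpPolicy  string // set by an operation-level dynamic-endpoint policy
	APIPolicy string // set by an API-level dynamic-endpoint policy
	PerOpRef  string // cluster owned by the referenced upstreamDefinition
	APILevel  string // API-level main/sandbox cluster
}

// effectiveCluster returns the first non-empty candidate in precedence order:
// op-policy > api-policy > per-op ref > api-level upstream.
func effectiveCluster(c routeCandidates) string {
	for _, name := range []string{c.OpPolicy, c.APIPolicy, c.PerOpRef, c.APILevel} {
		if name != "" {
			return name
		}
	}
	return ""
}

func main() {
	c := routeCandidates{
		PerOpRef: "upstream_main_shop-api_user-service",
		APILevel: "main_0123456789abcdef01234567",
	}
	fmt.Println(effectiveCluster(c)) // the per-op ref wins over the API-level cluster

	c.OpPolicy = "dynamic-cluster"
	fmt.Println(effectiveCluster(c)) // an operation-level policy overrides both
}
```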

URL-stable API-level cluster naming. API-level main/sandbox cluster names derive from the first 12 bytes of sha256(apiID) (24 hex), via clusterkey.APILevelName(env, apiID), producing main_<fragment> / sandbox_<fragment>. The env is a plaintext prefix, not part of the hash, so main and sandbox share the same fragment. The URL is intentionally excluded, so a URL edit keeps the same cluster name and Envoy updates its endpoints in place (no connection drain) rather than recreating the cluster. The clusters are STRICT_DNS with an inline load assignment. Rollout: a one-time cluster rebuild on the first config push after upgrade (URL-derived to identity-based names), handled gracefully by Envoy's CDS warming; stable thereafter.
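
A minimal sketch of the naming scheme described above, assuming the fragment is the hex encoding of the first 12 bytes of sha256(apiID). The functions apiLevelFragment and apiLevelName are hypothetical stand-ins for clusterkey.APILevel and clusterkey.APILevelName, whose actual source is not shown in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// apiLevelFragment is a hypothetical stand-in for clusterkey.APILevel:
// the first 12 bytes of sha256(apiID), hex-encoded to 24 characters.
func apiLevelFragment(apiID string) string {
	sum := sha256.Sum256([]byte(apiID))
	return hex.EncodeToString(sum[:12])
}

// apiLevelName is a hypothetical stand-in for clusterkey.APILevelName:
// env is a plaintext prefix and is never part of the hash input.
func apiLevelName(env, apiID string) string {
	return env + "_" + apiLevelFragment(apiID)
}

func main() {
	// The backend URL is not an input, so editing it cannot change these names.
	fmt.Println(apiLevelName("main", "shop-api"))
	fmt.Println(apiLevelName("sandbox", "shop-api"))
}
```

Because only apiID is hashed, the two printed names differ only in their prefix, which matches the statement that main and sandbox share the same fragment.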

Single source of truth. Ref resolution and connect-timeout parsing go through pkg/utils/upstreamref (FindByName, ParseConnectTimeout); API-level cluster-name hashing goes through pkg/utils/clusterkey (APILevel returns the identity fragment, APILevelName joins the env prefix). Both are consumed by the validator, the RDC transformer, and the xDS translator so they cannot drift.

What changed (gateway-controller layer)

  • OpenAPI schema (management-openapi.yaml): optional upstream on Operation resolving to RestAPIOperationUpstream (main/sandbox), each a ref-only RestAPIOperationUpstreamTarget. Wrapper and leaf both locked with additionalProperties: false.
  • Validator (api_validator.go): rejects an empty upstream: {} wrapper; requires at least one of main/sandbox; rejects a ref that does not resolve; and enforces one name contract (^[a-zA-Z0-9\-_]+$, max 100 chars) on both per-op refs and upstreamDefinition names so every valid definition name is referenceable.
  • Transform (transform/restapi.go) and xDS translator (xds/translator.go): per-op routes reuse the referenced definition's cluster (upstream_<kind>_<apiID>_<defName>, built once per definition); API-level clusters use URL-stable names main_<fragment>/sandbox_<fragment>; operations without a per-op upstream fall back to the API-level cluster.
  • Shared utils (pkg/utils/upstreamref, pkg/utils/clusterkey): new leaf packages. upstreamref handles ref resolution and connect-timeout parsing; the stdlib-only clusterkey handles URL-stable API-level cluster-key hashing (APILevel / APILevelName).

Test coverage

  • Unit tests across pkg/config, pkg/transform, pkg/xds, pkg/policy, pkg/utils/upstreamref, and pkg/utils/clusterkey: deterministic hash tests (APILevel), validator rejection tests, URL-stable cluster-name contract tests (same name across a URL edit), and definition-cluster reuse tests (operations sharing a ref share one cluster).
  • Integration tests (godog), 27 scenarios across four feature files (per-op-upstream-basic (9), per-op-upstream-ref (6), per-op-upstream-validation (6), api-level-url-stable (6)) covering API-level fallback, per-op main/sandbox ref overrides, validation rejections, dynamic-endpoint precedence over per-op refs, and URL-stable in-place endpoint updates. All passing against a local docker-compose stack.

Example

apiVersion: gateway.api-platform.wso2.com/v1alpha1
kind: RestApi
metadata:
  name: shop-api
spec:
  displayName: Shop API
  version: v1
  context: /shop/$version
  vhosts:
    main: shop.example.com
  upstreamDefinitions:
    - name: user-service
      upstreams:
        - url: http://user-service:8080
    - name: order-service
      upstreams:
        - url: http://order-service:8080
  upstream:
    main:
      url: http://default-backend:8080
  operations:
    - method: GET
      path: /products
      # no per-op upstream -> uses default-backend
    - method: GET
      path: /users/{id}
      upstream:
        main:
          ref: user-service      # ref-only: points at an upstreamDefinition
    - method: POST
      path: /orders
      upstream:
        main:
          ref: order-service

@coderabbitai

coderabbitai Bot commented May 22, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds per-operation upstream overrides for REST operations: schema and generated models (ref-only per-op main/sandbox), validator checks (pattern, max length, existence, and timeout parsing), shared helpers (upstreamref lookup and timeout parsing, clusterkey API-level naming), transformer and XDS wiring to reuse upstream-definition clusters with basePath/timeout inheritance, RDC/xDS cluster naming stability, and extensive unit, integration, and BDD tests.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ManagementAPI
  participant Transformer
  participant XdsTranslator
  participant UpstreamDefinitions
  participant Envoy

  Client->>ManagementAPI: deploy RestApi (may include per-op upstream.ref)
  ManagementAPI->>Transformer: StoredConfig -> RuntimeDeployConfig (include per-op upstream refs)
  Transformer->>UpstreamDefinitions: resolve per-op ref via upstreamref.FindByName
  Transformer->>XdsTranslator: emit routes referencing definition cluster keys
  XdsTranslator->>Envoy: create/register clusters (API-scoped clusterkey.APILevel / reused definition clusters)
  Envoy-->>Client: traffic routed to correct backend (per-op or API-level)
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.52% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main feature: adding per-operation upstream override capability for REST API operations, which aligns with the core objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description is comprehensive and well-structured, covering purpose, goals, approach, design decisions, implementation details, test coverage, and a concrete example. |

@CLAassistant

CLAassistant commented May 22, 2026


CLA assistant check
All committers have signed the CLA.

@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 22, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@gateway/gateway-controller/api/management-openapi.yaml`:
- Around line 4139-4148: OperationUpstream currently allows an empty object
which is invalid; update the OperationUpstream schema (the OperationUpstream
object that defines properties main and sandbox) to require at least one of
those properties by adding an anyOf clause such as anyOf: - required: [main] -
required: [sandbox] (or an equivalent oneOf/anyOf expression) so the schema
enforces presence of main or sandbox and prevents {} being valid.

In `@gateway/gateway-controller/pkg/policy/builder.go`:
- Around line 185-186: The current condition treats any non-nil
op.Upstream.Sandbox as active; change it to perform a content-based check like
apiSandboxHasContent so empty sandbox objects are ignored. Introduce/compute a
per-op predicate (e.g., perOpSandboxHasContent) that mirrors the api-level
sandbox content test and use if apiSandboxHasContent || (op.Upstream != nil &&
perOpSandboxHasContent) to decide whether to append effectiveSandboxVHost.

In `@gateway/gateway-controller/pkg/xds/translator_test.go`:
- Around line 2061-2074: Update TestResolvePerOpUpstream_DedupSameURL to ensure
URL-stability by creating two distinct api.Upstream instances with different Url
values (e.g., "http://shared-svc:8080" and "http://shared-svc:8081") but the
same apiID, method, path and env, then call translator.resolvePerOpUpstream for
both and assert the returned cluster names are equal; keep the test name and use
the Translator.resolvePerOpUpstream and translator variable references so the
test still targets the same logic and verifies that cluster key derivation does
not depend on the exact URL string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4012d4ee-58df-4773-b82e-fa11e65290d8

📥 Commits

Reviewing files that changed from the base of the PR and between 943c653 and b2426a7.

📒 Files selected for processing (16)
  • gateway/gateway-controller/api/management-openapi.yaml
  • gateway/gateway-controller/pkg/api/management/generated.go
  • gateway/gateway-controller/pkg/config/api_validator.go
  • gateway/gateway-controller/pkg/config/validator_test.go
  • gateway/gateway-controller/pkg/models/runtime_deploy_config.go
  • gateway/gateway-controller/pkg/policy/builder.go
  • gateway/gateway-controller/pkg/policy/builder_test.go
  • gateway/gateway-controller/pkg/transform/restapi.go
  • gateway/gateway-controller/pkg/transform/restapi_test.go
  • gateway/gateway-controller/pkg/utils/commonutils.go
  • gateway/gateway-controller/pkg/utils/commonutils_test.go
  • gateway/gateway-controller/pkg/xds/translator.go
  • gateway/gateway-controller/pkg/xds/translator_test.go
  • gateway/it/features/per-op-upstream-basic.feature
  • gateway/it/features/per-op-upstream-ref.feature
  • gateway/it/features/per-op-upstream-validation.feature

Comment thread gateway/gateway-controller/api/management-openapi.yaml Outdated
Comment thread gateway/gateway-controller/pkg/policy/builder.go Outdated
Comment thread gateway/gateway-controller/pkg/xds/translator_test.go Outdated
mehara-rothila pushed a commit to mehara-rothila/api-platform that referenced this pull request May 22, 2026
- OperationUpstream schema now requires at least one of main or sandbox
  via anyOf, rejecting empty {} at the schema layer as defense in depth
  alongside the validator.
- Policy builder per-op sandbox check is now content-based (trims and
  checks Url/Ref non-empty) matching the apiSandboxHasContent pattern
  used elsewhere. Fixes asymmetric pointer-only check that would emit
  sandbox policy chains for empty sandbox wrappers.
- TestResolvePerOpUpstream_DedupSameURL now uses two distinct URLs with
  identical apiID|METHOD|path|env, asserting cluster name independence
  from URL. Pins the EDS-stable design contract.
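
The content-based check described in the second bullet can be sketched as follows. The target struct and the helper names (nonBlank, sandboxHasContent) are hypothetical; the only behaviour taken from the commit message is that a sandbox wrapper counts as active only when its trimmed Url or Ref is non-empty.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a trimmed, hypothetical model of an upstream target with
// optional url and ref fields.
type target struct {
	URL *string
	Ref *string
}

// nonBlank reports whether s is set and contains non-whitespace characters.
func nonBlank(s *string) bool {
	return s != nil && strings.TrimSpace(*s) != ""
}

// sandboxHasContent treats nil and empty wrappers the same way: inactive.
func sandboxHasContent(t *target) bool {
	if t == nil {
		return false
	}
	return nonBlank(t.URL) || nonBlank(t.Ref)
}

func main() {
	ref := "order-service"
	blank := "   "
	fmt.Println(sandboxHasContent(nil))                  // no sandbox at all
	fmt.Println(sandboxHasContent(&target{}))            // empty wrapper
	fmt.Println(sandboxHasContent(&target{Ref: &blank})) // whitespace-only ref
	fmt.Println(sandboxHasContent(&target{Ref: &ref}))   // real ref
}
```

A pointer-only check would treat the second and third cases as active, which is the asymmetry the commit removes.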
@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 22, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 23, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 26, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
gateway/gateway-controller/pkg/policyxds/policyxds_test.go (1)

120-120: 💤 Low value

Consider using a hash-like cluster key suffix for test realism.

The cluster key "main_fixture" has the correct main_ prefix for an API-level main environment cluster, but the suffix fixture doesn't reflect the production naming pattern introduced in this PR. According to the PR objectives, API-level clusters follow the pattern <env>_<sha256(apiID|env)[:8]> (e.g., main_a1b2c3d4). While the current fixture name is functionally correct and clear, using a hash-like suffix such as "main_12ab34cd" would make the test more representative of production behavior.

📝 Proposed update for more realistic fixture naming
 					Upstream: models.RouteUpstream{
-						ClusterKey: "main_fixture",
+						ClusterKey: "main_12ab34cd",
 					},
 				UpstreamClusters: map[string]*models.UpstreamCluster{
-					"main_fixture": {
+					"main_12ab34cd": {
 						BasePath:  "/",

Also applies to: 132-132

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@gateway/gateway-controller/pkg/policyxds/policyxds_test.go` at line 120,
Update the test fixture cluster key strings that currently use "main_fixture" to
a realistic hash-like suffix such as "main_12ab34cd" to match the production
pattern; locate the occurrences in the policyxds_test setup where ClusterKey:
"main_fixture" is declared (the test fixture variables/constants used in Policy
XDS tests) and replace them with "main_12ab34cd" (or another 8-char hex-like
hash) to make the test names reflect the <env>_<sha256(... )[:8]> convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8729c50c-d6aa-40f4-b0b0-efec0f9c903c

📥 Commits

Reviewing files that changed from the base of the PR and between c242faa and ab55925.

📒 Files selected for processing (13)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/config/api_validator.go
  • gateway/gateway-controller/pkg/policyxds/policyxds_test.go
  • gateway/gateway-controller/pkg/transform/restapi.go
  • gateway/gateway-controller/pkg/transform/restapi_test.go
  • gateway/gateway-controller/pkg/utils/clusterkey/clusterkey.go
  • gateway/gateway-controller/pkg/utils/clusterkey/clusterkey_test.go
  • gateway/gateway-controller/pkg/utils/commonutils_test.go
  • gateway/gateway-controller/pkg/xds/translator.go
  • gateway/gateway-controller/pkg/xds/translator_test.go
  • gateway/it/features/api-level-eds-stable.feature
  • gateway/it/features/per-op-upstream-basic.feature
  • gateway/it/suite_test.go
💤 Files with no reviewable changes (1)
  • gateway/gateway-controller/pkg/utils/commonutils_test.go
✅ Files skipped from review due to trivial changes (2)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/it/suite_test.go

@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 30, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
gateway/gateway-controller/pkg/config/validator_test.go (1)

773-796: ⚡ Quick win

Isolate invalid-timeout cases with subtests.

Using require inside the loop means the first failing case can stop execution before the next case is checked. Wrapping each badTimeout in t.Run(...) keeps both cases independently validated and easier to diagnose.

Proposed test refactor
- for _, badTimeout := range []string{"0s", "-5s"} {
+ for _, badTimeout := range []string{"0s", "-5s"} {
+   t.Run(badTimeout, func(t *testing.T) {
      connect := badTimeout
      definitions := &[]api.UpstreamDefinition{
        {
          Name: "my-upstream",
          Timeout: &api.UpstreamTimeout{
            Connect: &connect,
          },
          Upstreams: []struct {
            Url    string `json:"url" yaml:"url"`
            Weight *int   `json:"weight,omitempty" yaml:"weight,omitempty"`
          }{
            {
              Url: "http://backend:8080",
            },
          },
        },
      }

      errors := validator.validateUpstreamDefinitions(definitions)
      require.Len(t, errors, 1, "timeout %q must be rejected", badTimeout)
      assert.Equal(t, "spec.upstreamDefinitions[0].timeout.connect", errors[0].Field)
      assert.Contains(t, errors[0].Message, "must be a positive duration")
+   })
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@gateway/gateway-controller/pkg/config/validator_test.go` around lines 773 -
796, The test loop in validator_test.go uses require inside a for-range which
can abort the whole loop on first failure; refactor by wrapping each badTimeout
iteration in a subtest using t.Run(fmt.Sprintf("timeout=%s", badTimeout), func(t
*testing.T) { ... }) and move the assertions (require.Len, assert.Equal,
assert.Contains) into that subtest, capturing the loop variable (e.g., tt :=
badTimeout) before using it to build the definitions passed to
validator.validateUpstreamDefinitions so each case runs and reports
independently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@gateway/gateway-controller/api/management-openapi.yaml`:
- Around line 4136-4137: The schemas that currently use a $ref with a sibling
description (referencing "`#/components/schemas/RestAPIOperationUpstream`") must
be changed so the description is not a sibling of $ref; instead wrap the
reference in an allOf and move the description onto the enclosing schema object.
Concretely, replace occurrences where a property has "$ref:
'`#/components/schemas/RestAPIOperationUpstream`'" alongside "description" by
creating an object with "description: <same text>" and "allOf: [{ $ref:
'`#/components/schemas/RestAPIOperationUpstream`' }]" (apply this change for every
place referencing RestAPIOperationUpstream in the file).

In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 656-667: The per-operation upstream ref checks (using
upstreamRefRegex and length >100 in the validate path that returns
ValidationError for field/refName) are stricter than names allowed by
validateUpstreamDefinitions, causing valid definition names to become
unreferencable; to fix, apply the same validation rules to
spec.upstreamDefinitions[*].name inside validateUpstreamDefinitions (reuse
upstreamRefRegex and the 100-character limit and return a ValidationError with
the same message format when a definition name violates them) OR remove the
extra constraints from the per-operation check so both places use the same
contract; update validateUpstreamDefinitions to reference the same
upstreamRefRegex and error messages (Field and Message) so names and refs are
aligned.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0b21e0b-ca37-459a-b596-04dce4de1349

📥 Commits

Reviewing files that changed from the base of the PR and between ab55925 and 4d08e3a.

📒 Files selected for processing (18)
  • gateway/gateway-controller/api/management-openapi.yaml
  • gateway/gateway-controller/pkg/api/management/generated.go
  • gateway/gateway-controller/pkg/config/api_validator.go
  • gateway/gateway-controller/pkg/config/validator_test.go
  • gateway/gateway-controller/pkg/policy/builder.go
  • gateway/gateway-controller/pkg/policy/builder_test.go
  • gateway/gateway-controller/pkg/transform/restapi.go
  • gateway/gateway-controller/pkg/transform/restapi_test.go
  • gateway/gateway-controller/pkg/utils/upstreamref/upstreamref.go
  • gateway/gateway-controller/pkg/utils/upstreamref/upstreamref_test.go
  • gateway/gateway-controller/pkg/xds/translator.go
  • gateway/gateway-controller/pkg/xds/translator_test.go
  • gateway/gateway-controller/tests/integration/storage_test.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/translator.go
  • gateway/gateway-runtime/policy-engine/internal/kernel/translator_test.go
  • gateway/it/features/per-op-upstream-basic.feature
  • gateway/it/features/per-op-upstream-ref.feature
  • gateway/it/features/per-op-upstream-validation.feature
🚧 Files skipped from review as they are similar to previous changes (4)
  • gateway/gateway-controller/pkg/policy/builder.go
  • gateway/gateway-controller/pkg/policy/builder_test.go
  • gateway/gateway-controller/pkg/transform/restapi.go
  • gateway/gateway-controller/pkg/xds/translator.go

Comment thread gateway/gateway-controller/api/management-openapi.yaml
Comment thread gateway/gateway-controller/pkg/config/api_validator.go
@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 30, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@renuka-fernando renuka-fernando changed the base branch from main to feature/operation-level-ep June 3, 2026 12:46
@mehara-rothila mehara-rothila force-pushed the feat/per-op-upstream-gateway branch 2 times, most recently from 4cfd850 to 4caca38 Compare June 4, 2026 00:16
@mehara-rothila

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@renuka-fernando renuka-fernando changed the base branch from feature/operation-level-ep to main June 4, 2026 00:31
@renuka-fernando renuka-fernando changed the base branch from main to feature/operation-level-ep June 4, 2026 00:32
@renuka-fernando renuka-fernando marked this pull request as ready for review June 4, 2026 00:32
Describe the current wiring (only policy xDS receives the transformer registry; main Envoy xDS translates RestAPI configs directly) instead of implying the xDS translator cannot use transformers.
@mehara-rothila mehara-rothila force-pushed the feat/per-op-upstream-gateway branch from 79cff8b to bc4fb65 Compare June 8, 2026 01:19
@mehara-rothila mehara-rothila changed the base branch from feature/operation-level-ep to main June 8, 2026 09:02
@mehara-rothila mehara-rothila changed the base branch from main to feature/operation-level-ep June 8, 2026 10:14
@mehara-rothila

Contributor Author

@coderabbitai I'd like your opinion on a design decision in the operation-level upstream schema.

Context (gateway-controller/api/management-openapi.yaml):

  • API-level Upstream allows oneOf: [url, ref], so a backend can be given as a raw URL or as a ref to a named upstreamDefinitions entry.
  • Operation-level RestAPIOperationUpstream has the same main and sandbox sub-fields as the API level, BUT each sub-field is a RestAPIOperationUpstreamTarget, which is required: [ref] (ref-only). A raw inline url is intentionally NOT allowed at the operation level.

Question: is restricting per-operation overrides to ref-only (no inline url) the right call, or should we mirror the API level and also allow url here?

Please weigh in on these specific angles by looking at the actual code:

  1. Envoy/EDS cluster stability: does ref-only (one named definition = one reusable, stable cluster) actually avoid cluster churn vs. allowing an ad-hoc cluster per inline URL? Trace how the xDS translator builds clusters from upstream-definition refs.
  2. Feature parity: upstreamDefinitions carry basePath, timeout, and weighted upstreams (load balancing). If we allowed a raw per-op url, would those settings be unreachable at the operation level?
  3. Validation / single source of truth: is validating all backend URLs once in spec.upstreamDefinitions and referencing by name simpler and less error-prone than re-validating inline URLs on every operation?
  4. Any downsides, edge cases, or inconsistencies you see in keeping operation-level ref-only while the API level stays url-or-ref?

Please confirm whether the ref-only restriction is sound, or flag concerns.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Trace cluster building in xDS translator for per-op refs vs URL-based upstreams
rg -n "resolvePerOpDefinitionCluster\|resolveUpstreamCluster\|sanitizeEnvoyClusterName\|UpstreamDefinitionClusterPrefix\|perOpDefinitionClusterKey" --type go -A 5 -B 2

echo "---CLUSTERKEY---"
cat gateway/gateway-controller/pkg/utils/clusterkey/clusterkey.go 2>/dev/null || fd -i "clusterkey.go" --exec cat {}

echo "---UPSTREAMREF---"
cat gateway/gateway-controller/pkg/utils/upstreamref/upstreamref.go 2>/dev/null || fd -i "upstreamref.go" --exec cat {}

echo "---XDS_TRANSLATOR_CLUSTER_BUILD---"
rg -n "createCluster\|ConnectTimeout\|UpstreamDefinition\|basePath\|ClusterKey\|EnvoyClusterName" gateway/gateway-controller/pkg/xds/translator.go -A 3 -B 2

echo "---OPERATION_UPSTREAM_SCHEMA---"
rg -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|ref-only\|inline" gateway/gateway-controller/api/management-openapi.yaml -A 8 -B 2

echo "---UPSTREAM_DEF_FIELDS---"
rg -n "basePath\|BasePath\|timeout\|Timeout\|weighted\|Weighted\|upstreams\|Upstreams" gateway/gateway-controller/pkg/api/management/generated.go -B 2 -A 2 | head -120

Length of output: 5710


🏁 Script executed:

#!/bin/bash
# Find actual file paths
fd -i "translator.go" --type f | grep -v test
fd -i "management-openapi.yaml" --type f
fd -i "generated.go" --type f | grep management

echo "---SEARCH FOR KEY SYMBOLS---"
rg -rn "resolvePerOpDefinitionCluster\|UpstreamDefinitionClusterPrefix\|perOpDefinitionClusterKey" --type go -A 8 -B 2

echo "---UPSTREAM DEF STRUCT---"
rg -rn "type UpstreamDefinition\|BasePath\|basePath\|Weighted\|weighted" --type go -A 3 -B 1 | head -80

echo "---OPERATION UPSTREAM OPENAPI---"
rg -rn "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|ref-only" -A 8 -B 2 | head -120

echo "---CLUSTER BUILD IN TRANSLATOR---"
rg -rn "createCluster\|ConnectTimeout\|ClusterKey\|EnvoyClusterName\|UpstreamDefinition" --type go -A 4 -B 2 | grep -v "_test.go" | head -200

Length of output: 1329


🏁 Script executed:

#!/bin/bash
echo "---TRANSLATOR---"
cat gateway/gateway-controller/pkg/xds/translator.go

echo "---OPENAPI YAML OPERATION UPSTREAM---"
grep -n "RestAPIOperation\|Operation\|upstream\|ref\|basePath\|timeout\|weighted" gateway/gateway-controller/api/management-openapi.yaml | head -120

echo "---UPSTREAM DEFINITION STRUCT IN GENERATED---"
grep -n "UpstreamDefinition\|BasePath\|Weighted\|Timeout\|Connect\|UpstreamTimeout" gateway/gateway-controller/pkg/api/management/generated.go | head -80

echo "---RESTAPI TRANSFORMER KEY SECTIONS---"
grep -n "perOp\|per_op\|PerOp\|DefinitionCluster\|definitionCluster\|UpstreamDef\|upstreamDef\|clusterkey\|upstreamref" gateway/gateway-controller/pkg/transform/restapi.go | head -80

Length of output: 123989



@mehara-rothila — based on reviewing the actual translator and transformer code, the ref-only restriction is well-founded. Here is the analysis across each angle.


1. Envoy/EDS cluster stability

The cluster name for a per-op override is computed as:

// translator.go – resolvePerOpDefinitionCluster
clusterName := constants.UpstreamDefinitionClusterPrefix + kind + "_" + apiID + "_" + sanitizeUpstreamDefinitionName(definition.Name)

The name is derived from the definition name, never from the URL. The upstream-definitions loop creates this cluster unconditionally, so any number of operations sharing the same ref reuse a single pre-existing cluster — zero per-op cluster creation. Compare this with what would happen if inline URLs were allowed: either a new cluster would be minted per operation (O(N) cluster churn on every config push when N operations exist), or a stable name would need to be invented without a definition-name anchor, which would require an entirely separate naming and EDS-stability scheme. The current design avoids both problems.
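
The reuse property described above can be illustrated with a minimal sketch. It assumes the prefix is `upstream_` (taken from the `upstream_<kind>_<apiID>_<defName>` shape in the PR summary). `definitionClusterName`, `operation`, `distinctClusters`, and the sample values are hypothetical stand-ins, not the controller's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterPrefix is an assumed value for constants.UpstreamDefinitionClusterPrefix.
const clusterPrefix = "upstream_"

// operation is a hypothetical, trimmed-down view of an API operation.
type operation struct {
	Path    string
	MainRef string // name of the referenced upstreamDefinition; empty means no override
}

// definitionClusterName derives the cluster name from the definition name only,
// so the backend URL never influences it.
func definitionClusterName(kind, apiID, defName string) string {
	sanitized := strings.NewReplacer(".", "_", ":", "_").Replace(defName)
	return clusterPrefix + kind + "_" + apiID + "_" + sanitized
}

// distinctClusters returns the set of cluster names the per-op refs resolve to.
func distinctClusters(kind, apiID string, ops []operation) map[string]struct{} {
	out := make(map[string]struct{})
	for _, op := range ops {
		if op.MainRef == "" {
			continue // falls back to the API-level upstream
		}
		out[definitionClusterName(kind, apiID, op.MainRef)] = struct{}{}
	}
	return out
}

func main() {
	ops := []operation{
		{Path: "/orders", MainRef: "orders-svc"},
		{Path: "/orders/{id}", MainRef: "orders-svc"},
		{Path: "/health"},
	}
	// Two operations share one ref and the third has no override,
	// so only one definition cluster is needed.
	fmt.Println(len(distinctClusters("RestApi", "api-1", ops)))
}
```

Because the URL is not an input to the name, editing a definition's URL leaves the set of cluster names unchanged in this sketch.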


2. Feature parity

UpstreamDefinition carries three capabilities that a raw inline URL cannot provide:

| Feature | upstreamDefinitions ref | Inline URL |
|---|---|---|
| basePath | ✅ Inherited by every referencing route | ❌ Not applicable |
| timeout.connect | ✅ Per-definition, propagated to cluster | ❌ No per-op timeout field exists |
| Weighted upstreams (load balancing) | ✅ Multiple endpoints | ❌ Single endpoint only |

If a raw URL were permitted at the operation level, those users would silently get a stripped-down backend configuration. The ref-only constraint ensures all per-op overrides have full parity with what the definition system provides.


3. Validation / single source of truth

Both the xDS translator (resolveUpstreamDefinition → upstreamref.FindByName) and the RDC transformer (perOpDefinitionClusterKey → upstreamref.FindByName) resolve refs through the same utility function:

// upstreamref.go
func FindByName(ref string, defs *[]api.UpstreamDefinition) (*api.UpstreamDefinition, error) { ... }

URL validation (scheme, host, parse errors) is performed once per definition entry, not per operation. If inline URLs were allowed at the operation level, URL parsing and scheme validation would need to be duplicated in validateRestData/validateOperations, and the xDS translator would need a new code branch that bypasses resolvePerOpDefinitionCluster. The current design keeps the validator's operation upstream check as a pure ref-existence lookup, which is significantly simpler and less error-prone.
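
A pure ref-existence lookup of this kind can be sketched as follows. This is a simplified illustration, not the body of `upstreamref.FindByName` (which is elided above): `upstreamDefinition` and `findByName` are hypothetical, the sketch takes a plain slice rather than a pointer to a slice, and trimming both sides of the comparison is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamDefinition is a hypothetical stand-in for api.UpstreamDefinition.
type upstreamDefinition struct {
	Name string
}

// findByName resolves a ref to the definition with the same name.
// No URL parsing happens here: URLs are validated once, on the definitions.
func findByName(ref string, defs []upstreamDefinition) (*upstreamDefinition, error) {
	ref = strings.TrimSpace(ref)
	if ref == "" {
		return nil, fmt.Errorf("upstream ref is empty")
	}
	for i := range defs {
		if strings.TrimSpace(defs[i].Name) == ref {
			return &defs[i], nil // pointer into the slice, not a copy
		}
	}
	return nil, fmt.Errorf("upstream definition %q not found", ref)
}

func main() {
	defs := []upstreamDefinition{{Name: "orders-svc"}, {Name: "billing"}}
	if d, err := findByName("orders-svc", defs); err == nil {
		fmt.Println("resolved:", d.Name)
	}
	if _, err := findByName("missing", defs); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape, the operation-level check reduces to one lookup per ref, and both consumers fail the same way on an unknown name.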


4. Downsides and edge cases

The one genuine downside is boilerplate for trivial cases: if a user wants to route a single operation to a different backend, they must declare an upstreamDefinition entry even if it is referenced only once. This is a real usability tax for simple scenarios.

Beyond that, the main structural consideration is the asymmetry between API-level (url or ref) and operation-level (ref-only). This asymmetry is intentional and defensible: the API-level url shorthand exists as a convenience for the majority of APIs where a single backend needs no named definition. A per-operation upstream override is, by definition, a more advanced routing scenario — the kind of scenario where the cluster reuse, basePath inheritance, and timeout control that definitions provide are most valuable. Offering a raw URL shortcut at the operation level would add implementation complexity (new cluster-naming scheme, new validation path, duplicate logic) while delivering a worse-featured result for users.


Summary: The ref-only restriction is sound. It preserves EDS-stable cluster identity, delivers full feature parity (basePath, timeout, weighted load balancing), and eliminates a duplicate validation and cluster-naming code path. The only trade-off — mandatory definition declaration even for single-use refs — is reasonable given the benefits. If you find that single-use boilerplate becomes a friction point in practice, it could be addressed later by adding a convenience url shorthand that internally auto-generates a definition, but that would be a separate, additive change and is not needed now.

@mehara-rothila mehara-rothila marked this pull request as draft June 16, 2026 08:07
@mehara-rothila mehara-rothila changed the base branch from feature/operation-level-ep to main June 19, 2026 06:18
API-level main and sandbox clusters are now named "<env>_<fragment>",
where fragment is the first 12 bytes of sha256(apiID) in hex, via
clusterkey.APILevelName. Main and sandbox share the fragment and are
set apart by the env prefix. The name excludes the URL, so a URL edit
updates the same named cluster instead of renaming it, keeping routes
and cluster stats keys continuous.

Adds the API-level url-stable IT scenarios with their Envoy admin step
definitions, an LLM cluster-naming unit test, and updates the
controller data-model spec to match.
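
The naming rule in this commit message can be sketched as below. It is an illustration under assumptions, not the body of `clusterkey.APILevelName`: `apiLevelName` is a hypothetical helper, the env prefixes are assumed to be `main` and `sandbox`, and "first 12 bytes in hex" is read as 24 hex characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// apiLevelName builds "<env>_<fragment>", where fragment is the hex encoding
// of the first 12 bytes of sha256(apiID). The URL is not an input.
func apiLevelName(env, apiID string) string {
	sum := sha256.Sum256([]byte(apiID))
	return env + "_" + hex.EncodeToString(sum[:12])
}

func main() {
	// Main and sandbox share the fragment and differ only by prefix.
	fmt.Println(apiLevelName("main", "api-1"))
	fmt.Println(apiLevelName("sandbox", "api-1"))
}
```

Since only the API identity feeds the hash, a URL edit maps to the same name and can be applied as an endpoint update on the existing cluster.
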
@mehara-rothila

Contributor Author

@coderabbitai review the per-operation upstream schema (RestAPIOperationUpstream + RestAPIOperationUpstreamTarget) in gateway/gateway-controller/api/management-openapi.yaml. Is it well-formed and consistent with how the Go validator and transform/xds consume it? Flag any changes, or confirm it's fine. Give your own independent read.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "===OPENAPI YAML - FULL OPERATION UPSTREAM SCHEMAS==="
grep -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|Operation:\|operations:" gateway/gateway-controller/api/management-openapi.yaml | head -60

echo ""
echo "===FULL SCHEMA BLOCKS==="
python3 - <<'EOF'
import re

with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
    content = f.read()

# Extract RestAPIOperationUpstream and RestAPIOperationUpstreamTarget blocks
for name in ["RestAPIOperationUpstream:", "RestAPIOperationUpstreamTarget:", "Operation:"]:
    idx = content.find(f"    {name}")
    if idx == -1:
        idx = content.find(name)
    if idx != -1:
        print(f"\n===SCHEMA: {name}===")
        print(content[idx:idx+600])
        print("...")
EOF

echo ""
echo "===GENERATED.GO - RestAPIOperationUpstream types==="
grep -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|union json\|MarshalJSON\|UnmarshalJSON\|AsRestAPI\|FromRestAPI\|MergeRestAPI" gateway/gateway-controller/pkg/api/management/generated.go | head -80

echo ""
echo "===GENERATED.GO - struct bodies==="
python3 - <<'EOF'
with open("gateway/gateway-controller/pkg/api/management/generated.go") as f:
    content = f.read()

for name in ["RestAPIOperationUpstream", "RestAPIOperationUpstreamTarget", "RestAPIOperationUpstream0", "RestAPIOperationUpstream1"]:
    idx = content.find(f"type {name} struct")
    if idx == -1:
        idx = content.find(f"type {name} =")
    if idx != -1:
        print(f"\n===TYPE: {name}===")
        print(content[idx:idx+300])
EOF

echo ""
echo "===VALIDATOR - operation upstream validation==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|upstreamRef\|OperationUpstream\|RestAPIOperationUpstream" gateway/gateway-controller/pkg/config/api_validator.go -A 5 -B 2 | head -160

echo ""
echo "===TRANSFORMER - per-op upstream consumption==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|PerOp\|perOp\|perOpDefinitionClusterKey" gateway/gateway-controller/pkg/transform/restapi.go -A 5 -B 2 | head -160

echo ""
echo "===XDS TRANSLATOR - per-op upstream consumption==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|resolvePerOpDefinitionCluster\|PerOp\|perOp" gateway/gateway-controller/pkg/xds/translator.go -A 5 -B 2 | head -160

Length of output: 23172



@mehara-rothila — here is the independent read.


Overall verdict: mostly well-formed and consistent, with one structural concern worth addressing.


What's correct and consistent

| Item | YAML | Go validator | Transformer / xDS |
|---|---|---|---|
| ref pattern | ^[a-zA-Z0-9\-_]+$ | Same regex compiled in NewAPIValidator | N/A (validator gate only) |
| ref maxLength | 100 | Explicit > 100 check | |
| ref minLength | minLength: 1 | strings.TrimSpace + empty check | |
| upstream on Operation | Optional $ref | op.Upstream != nil guard throughout | |
| additionalProperties: false | Both schemas | Not needed in Go (struct-typed) | |
| allOf + single $ref for main/sandbox | Valid OAS 3.0 pattern for description+$ref | Generates *RestAPIOperationUpstreamTarget correctly | |
| "at least one of main/sandbox" | anyOf: [required:[main], required:[sandbox]] | up.Main == nil && up.Sandbox == nil check | |

Concern: anyOf generates problematic union boilerplate

The anyOf clause triggers oapi-codegen's union-type machinery. This produces:

// Generated dead code — both are just interface{}
type RestAPIOperationUpstream0 = interface{}
type RestAPIOperationUpstream1 = interface{}

// Plus As/From/Merge methods for each — also dead code

type RestAPIOperationUpstream struct {
    Main    *RestAPIOperationUpstreamTarget `json:"main,omitempty"`
    Sandbox *RestAPIOperationUpstreamTarget `json:"sandbox,omitempty"`
    union   json.RawMessage                 // ← populated only by UnmarshalJSON
}

func (t RestAPIOperationUpstream) MarshalJSON() ([]byte, error) {
    b, err := t.union.MarshalJSON()  // ← returns null if union is nil
    ...
}

The latent bug: MarshalJSON delegates entirely to the union json.RawMessage field. That field is populated only during UnmarshalJSON (i.e., when a client sends JSON to the API). Any code path that constructs a RestAPIOperationUpstream directly in Go — such as a GET response that loads config from the database and re-serializes it — will have union == nil and will serialize as null instead of {"main":{"ref":"foo"}}.

The transformer and xDS translator access op.Upstream.Main and op.Upstream.Sandbox directly and never call MarshalJSON, so those paths are unaffected. But the API response path (GET endpoint) will hit this if the stored config struct is serialized back to JSON.

The anyOf constraint itself is also redundant: the Go validator already enforces "at least one of main/sandbox" independently. The schema-level constraint does not add server-side protection beyond what the validator provides.

Recommended fix: replace the anyOf with minProperties: 1 if oapi-codegen supports it without generating union code, or simply remove the anyOf and rely on the validator + the existing description text for the constraint. Either approach eliminates the union boilerplate and the MarshalJSON null risk.

# Current — generates union machinery
RestAPIOperationUpstream:
  type: object
  additionalProperties: false
  anyOf:
    - required: [main]
    - required: [sandbox]
  properties: ...

# Suggested — no union generation; constraint documented and enforced by validator
RestAPIOperationUpstream:
  type: object
  additionalProperties: false
  minProperties: 1   # if generator respects this without union codegen
  description: "... At least one of main or sandbox must be set (enforced by server-side validation)."
  properties: ...

If minProperties: 1 also triggers union code in your generator version, removing the anyOf entirely is the safer option — the validator enforcement is the authoritative gate.


Summary

The schema accurately models the intended constraints, and all field-level rules (pattern, maxLength, minLength, additionalProperties) are faithfully mirrored in the validator. The one actionable item is the anyOf clause: it generates dead code and a MarshalJSON-null risk that could surface in the API GET response path. Everything the transformer and xDS translator consume is the struct fields directly, so those paths are sound regardless.

Comment on lines +4330 to +4331
allOf:
- $ref: "#/components/schemas/RestAPIOperationUpstreamTarget"

Contributor

Remove allOf

Comment thread gateway/gateway-controller/api/management-openapi.yaml
Apply the upstream-definition name contract (minLength 1, maxLength 100,
pattern ^[a-zA-Z0-9\-_]+$) to the API-level Upstream.ref so it validates
the same as UpstreamDefinition.name and the per-operation ref. Reference
RestAPIOperationUpstreamTarget directly from the per-operation main and
sandbox instead of wrapping it in allOf, and regenerate the management model.

Define the upstream-definition name contract once in a new UpstreamReference
schema (string, minLength 1, maxLength 100, pattern ^[a-zA-Z0-9\-_]+$) and
reference it from UpstreamDefinition.name, Upstream.ref, and the per-operation
RestAPIOperationUpstreamTarget.ref instead of repeating the constraints inline.
Regenerate the management model; oapi-codegen emits UpstreamReference as a
string alias, so consuming code is unchanged.
@mehara-rothila

Contributor Author

@coderabbitai Investigate the policy-building flow in the wso2/api-platform gateway-controller (Go) on the feat/per-op-upstream-gateway branch. I want an independent answer to whether DerivePolicyFromAPIConfig is actually used, and what the real policy path is.

Context: policy building appears to exist in two places:

  1. pkg/policy/builder.go: DerivePolicyFromAPIConfig(...) returns *models.StoredPolicyConfig (held in pkg/storage/policy_store.go). Its doc comment claims it is used by APIDeploymentService, APIServer handlers, and main.go startup, but it also carries a TODO: "APIServer handlers (REST API path) - TODO: Refactor this to use the implementation."
  2. pkg/transform/restapi.go: RestAPITransformer.Transform builds policy chains via buildPolicyChain(...) into rdc.PolicyChains, which pkg/policyxds consumes and pushes to the policy engine.

A grep across the whole gateway directory found NO callers of DerivePolicyFromAPIConfig (only its own definition in builder.go).

Please verify independently and report, with file:line evidence:

  1. Is DerivePolicyFromAPIConfig (builder.go) called anywhere in the codebase? If yes, where exactly? If not, is it dead/orphaned code? Also check whether anything ever writes to policy_store.
  2. What is the ACTUAL active path that builds per-route policy chains and feeds the policy engine for a deployed REST API (trace it from the deploy event to the policy engine)? Is it the restapi.go transform -> rdc.PolicyChains -> policyxds path, or builder.go, or both?
  3. The per-op feature modified the sandbox vhost-gating in builder.go (added perOpSandboxHasContent). Is that copy on the active path, or did it edit an unused function? Is the equivalent gating in restapi.go (and xds/translator.go) the one actually exercised at runtime?
  4. Bottom line: did the per-op work correctly wire per-op (incl. per-op sandbox) policy building through the active path, or is there a real gap (e.g., builder.go was supposed to be the path but is never called)? Are we missing anything, or is building it only in the transform path correct?

Give your own independent verdict.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "===1. CALLERS OF DerivePolicyFromAPIConfig==="
rg -rn "DerivePolicyFromAPIConfig" --type go

echo ""
echo "===2. POLICY_STORE WRITES==="
rg -rn "policy_store\|PolicyStore\|StoredPolicyConfig\|SetPolicy\|SavePolicy\|WritePolicy\|StorePolicy" --type go | grep -v "_test.go" | head -80

echo ""
echo "===3. FULL BUILDER.GO==="
cat gateway/gateway-controller/pkg/policy/builder.go

echo ""
echo "===4. FULL POLICY_STORE.GO==="
fd -i "policy_store.go" --exec cat {}

echo ""
echo "===5. DEPLOY EVENT → TRANSFORM CHAIN==="
rg -rn "RestAPITransformer\|Transform\(\|Transform\b\|\.Transform" --type go | grep -v "_test.go" | head -80

echo ""
echo "===6. POLICYXDS CONSUMPTION==="
fd -i "policyxds" --type d
rg -rn "PolicyChains\|policyxds\|PushPolicy\|BuildPolicy" --type go | grep -v "_test.go" | head -80

echo ""
echo "===7. MAIN.GO OR DEPLOYMENT SERVICE WIRING==="
fd -i "main.go" --type f | xargs grep -l "policy\|Policy\|transformer\|Transformer" 2>/dev/null
rg -rn "APIDeploymentService\|deploymentService\|DeployAPI\|HandleDeploy\|RestAPITransformer\|TransformerRegistry\|transformerRegistry" --type go | grep -v "_test.go" | head -80

echo ""
echo "===8. HOW SANDBOX GATING APPEARS IN BOTH BUILDER AND RESTAPI==="
grep -n "perOpSandboxHasContent\|apiSandboxHasContent\|hasSandbox\|SandboxHas" gateway/gateway-controller/pkg/policy/builder.go gateway/gateway-controller/pkg/transform/restapi.go 2>/dev/null

Length of output: 24393


🏁 Script executed:

#!/bin/bash
echo "===1. CALLERS OF 'n' FUNCTION (non-test) FROM POLICY PACKAGE==="
rg -rn "\bpolicybuilder\.n\b\|\bpolicy\.n\b\|DerivePolicyFromAPIConfig" --type go | grep -v "_test.go"

echo ""
echo "===2. CONTROLLER MAIN.GO FULL==="
cat gateway/gateway-controller/cmd/controller/main.go

echo ""
echo "===3. POLICYXDS PACKAGE CONTENTS==="
fd --type f --extension go "." gateway/gateway-controller/pkg/policyxds/ | sort | xargs -I{} sh -c 'echo "---FILE: {}---" && cat {}'

echo ""
echo "===4. RESTAPITRANSFORMER - WHERE IS IT INSTANTIATED AND CALLED==="
rg -rn "RestAPITransformer\|NewRestAPITransformer\|transformerRegistry\|TransformerRegistry\|\.Transform(" --type go | grep -v "_test.go"

echo ""
echo "===5. POLICYSTORE.SET CALLERS (production)==="
rg -rn "\.Set(" --type go | grep -i "policy\|store" | grep -v "_test.go"

echo ""
echo "===6. POLICYXDS MAIN ENTRYPOINT AND HOW IT GETS RDC==="
fd --type f --extension go "." gateway/gateway-controller/pkg/policyxds/ | head -5

echo ""
echo "===7. HOW STOREDPOLICYCONFIG IS PRODUCED AND CONSUMED==="
rg -rn "StoredPolicyConfig\|StoredPolicyConf" --type go | grep -v "_test.go" | head -40

echo ""
echo "===8. WHAT PRODUCES rdc.PolicyChains AND WHO CONSUMES==="
rg -rn "PolicyChains\|\.PolicyChains" --type go | grep -v "_test.go" | head -40

echo ""
echo "===9. APISERVER HANDLER PRODUCTION CODE==="
fd --type f --extension go "." gateway/gateway-controller/pkg/api/handlers/ | grep -v test | sort | head -20
rg -rn "DeployAPI\|HandleDeploy\|apiDeployment\|APIDeployment\|DeploymentService\|deploymentService\|policybuilder\|DerivePolicyFrom" --type go | grep -v "_test.go" | head -50

Length of output: 155400



@mehara-rothila — here is the independent verdict with file:line evidence from the actual code.


1. Is DerivePolicyFromAPIConfig called anywhere in production?

No. It is dead production code.

The function that the doc comment calls DerivePolicyFromAPIConfig is actually named n (unexported, single-letter) in pkg/policy/builder.go:

// DerivePolicyFromAPIConfig derives a policy configuration from an API stored config.
// Handles both RestApi and WebSubApi kinds. This is a shared utility used by:
// - APIDeploymentService (WebSocket event path)
// - APIServer handlers (REST API path) - TODO: Refactor this to use the implementation
// - main.go startup (loading existing configs)
func n(cfg *models.StoredConfig, ...) *models.StoredPolicyConfig {

A search for every caller of n across the entire codebase returns only test files:

| Caller | File |
|---|---|
| policybuilder.n(...) | pkg/api/handlers/policy_ordering_test.go |
| policybuilder.n(...) | pkg/api/handlers/handlers_test.go |
| n(...) | pkg/policy/builder_test.go |
| policybuilder.n(...) | cmd/controller/main_test.go |

Zero matches in any non-test file.

PolicyStore (pkg/storage/policy_store.go) also confirms this: it has a Set method, but no production code calls it. The doc comment claiming the function is used by "APIDeploymentService", "APIServer handlers", and "main.go startup" is factually incorrect and will mislead future maintainers.


2. What is the actual active policy-building path?

The active path, traced from cmd/controller/main.go, is:

main.go
  → transform.NewRestAPITransformer(...)             // line ~307
  → transform.NewRegistry(restTransformer, ...)      // line ~310
  → policyManager.SetTransformers(transformerRegistry)  // line ~312
  → loadRuntimeConfigsFromExistingAPIConfigurations(...)  // startup hydration
      → transformerRegistry.Transform(cfg)           // per each StoredConfig
  → [on deploy] policyManager.UpsertAPIConfig(cfg)   // pkg/policyxds/manager.go
      → pm.transformers.Transform(cfg)               // calls RestAPITransformer.Transform
          → buildPolicyChain(...)                    // pkg/transform/restapi.go
          → rdc.PolicyChains populated
      → runtimeStore.Set(key, rdc)
      → policySnapshotManager.UpdateSnapshot(ctx)
          → translator.TranslateRuntimeConfigs(rdcs)
          → policyCache.SetResources(...)            // pushed to policy engine via xDS

RestAPITransformer.Transform in pkg/transform/restapi.go is the sole production builder for rdc.PolicyChains. builder.go / n is never on this path.


3. Is the builder.go sandbox-gating change on the active path?

No. The perOpSandboxHasContent predicate added in builder.go (Line 185) modifies the function n, which has zero production callers. That change is on dead code.

The sandbox gating that is actually exercised at runtime lives in pkg/transform/restapi.go:

// restapi.go – active path
apiSandboxHasContent := ...  // Line 145
hasSandbox := apiSandboxHasContent
// per-op override loop sets hasSandbox = true when op.Upstream.Sandbox != nil  // Line 151-156
if apiSandboxHasContent || (op.Upstream != nil && op.Upstream.Sandbox != nil) {  // Line 229
    // append sandbox vhost route
}

This logic is called through policyManager.UpsertAPIConfig on every deploy. The xds/translator.go path for per-op sandbox is also on the active Envoy xDS path and is correctly gated.


4. Bottom line: is the per-op work correctly wired?

Yes for the active path; no for builder.go.

| Location | Active in production? | Per-op wiring correct? |
|---|---|---|
| pkg/transform/restapi.go (→ rdc.PolicyChains) | ✅ Yes | ✅ Yes: op.Upstream.Main/.Sandbox handled |
| pkg/xds/translator.go | ✅ Yes | ✅ Yes: resolvePerOpDefinitionCluster wired |
| pkg/policy/builder.go (n) | ❌ No, dead code | N/A |

There is no functional gap at runtime. The per-op sandbox override routes are built correctly through restapi.go and pushed to the policy engine via policyxds.

The actionable issue is builder.go itself. It carries a misleading doc comment claiming production usage, is tested as if it were a live component, and accumulates changes (including the perOpSandboxHasContent addition in this PR) that have no runtime effect. This creates ongoing maintenance risk: future per-op features added only to builder.go will silently not deploy, and the test coverage against it creates false confidence. The function should either be removed or the doc comment corrected to mark it explicitly as test-only.

Would you like to open a follow-up issue to track removal or correction of pkg/policy/builder.go (n) and pkg/storage/policy_store.go?

pkg/policy/builder.go (DerivePolicyFromAPIConfig) has no production callers,
so the per-op sandbox vhost gating added there had no runtime effect. The
active per-op policy path runs through transform/restapi.go and pkg/policyxds.
Revert builder.go and its test to base to keep the change on the active code.

The per-op upstream comments in the RestAPI transform and xDS translator
ran 5 to 7 lines each, well above the surrounding one to two line style.
Condense them to the essential invariants and move the longer rationale to
the PR description. No behavior change.

The per-op change renamed a TranslateRuntimeConfigs fixture cluster key
that no assertion depends on. Restore the original key so the per-op diff
leaves this test untouched.

- Reject compound and signed connect timeouts with a clear, discoverable
  error, and drop the leading-sign allowance so the validator matches the
  published OpenAPI pattern.
- Share per-op definition cluster naming through clusterkey.DefinitionName
  so both xDS paths derive the name identically; remove the two duplicate
  sanitizers.
- Drop the unused UpstreamCluster.ConnectTimeout field and its dormant
  reader; the data plane resolves timeouts via the legacy translator.
- Trim the upstream ref lookup symmetrically and correct the
  EnvoyClusterName doc comment.
@mehara-rothila

Contributor Author

@coderabbitai review

I've applied a round of fixes from an external review on this per-operation upstream PR. Please give your own independent assessment of whether the current state is correct. Verify each of these against the actual code and flag anything wrong, inconsistent, or newly broken, with file:line:

  • Connect-timeout validation: the regex is now ^\d+(\.\d+)?(ms|s|m|h)$ and the rejection message describes the single-unit, unsigned constraint. Is the validation correct and does it match the OpenAPI pattern?
  • Per-op definition cluster naming: extracted into clusterkey.DefinitionName, used at all controller sites, with the two old sanitizers removed. Does it produce the exact same cluster name as before (no 503 / NoRoute risk)?
  • Removed field: the unused UpstreamCluster.ConnectTimeout field and its reader were removed. Confirm nothing still depends on them.
  • Shared schema: the UpstreamReference schema used by UpstreamDefinition.name, Upstream.ref, and the per-op target ref.

Don't assume the changes are right; give a skeptical, independent read.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "===1. CONNECT-TIMEOUT VALIDATION==="
echo "---validator.go timeout regex and error messages---"
rg -n "timeout\|connect\|regex\|Regexp\|ParseDuration\|ParseConnect\|^\\\d\|ms|s|m|h\|unit\|unsigned\|single" gateway/gateway-controller/pkg/config/api_validator.go | head -60

echo ""
echo "---upstreamref.go ParseConnectTimeout---"
fd -i "upstreamref.go" --type f | grep -v test | xargs cat

echo ""
echo "---OpenAPI yaml pattern for timeout---"
rg -n "connect\|timeout\|pattern\|format\|Timeout" gateway/gateway-controller/api/management-openapi.yaml | head -40

echo ""
echo "===2. clusterkey.DefinitionName==="
echo "---clusterkey package full contents---"
fd -i "clusterkey.go" --type f | grep -v test | xargs cat

echo ""
echo "---old sanitizer presence check---"
rg -rn "sanitizeEnvoyClusterName\|sanitizeUpstreamDefinitionName\|sanitizeClusterName" --type go | grep -v "_test.go"

echo ""
echo "---DefinitionName callers across codebase---"
rg -rn "DefinitionName\|clusterkey\.DefinitionName" --type go | grep -v "_test.go"

echo ""
echo "---old cluster naming in translator.go---"
rg -n "UpstreamDefinitionClusterPrefix\|upstream_\|kind.*apiID\|apiID.*kind\|definition.*Name\|Name.*definition\|sanitize" gateway/gateway-controller/pkg/xds/translator.go | head -30

echo ""
echo "---old cluster naming in restapi.go---"
rg -n "UpstreamDefinitionClusterPrefix\|upstream_\|perOpDefinitionClusterKey\|DefinitionName\|sanitize" gateway/gateway-controller/pkg/transform/restapi.go | head -30

echo ""
echo "===3. UpstreamCluster.ConnectTimeout removed==="
echo "---models runtime_deploy_config.go---"
fd -i "runtime_deploy_config.go" --type f | grep -v test | xargs cat

echo ""
echo "---any remaining ConnectTimeout references in non-test go files---"
rg -rn "ConnectTimeout\|connect_timeout" --type go | grep -v "_test.go"

echo ""
echo "===4. Shared UpstreamReference schema==="
echo "---OpenAPI yaml UpstreamReference schema---"
python3 - <<'EOF'
import re
try:
    with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
        content = f.read()
    # Find UpstreamReference schema
    idx = content.find("UpstreamReference:")
    if idx != -1:
        print(content[idx:idx+400])
    else:
        print("UpstreamReference not found in yaml")
    # Find all usages of UpstreamReference
    for m in re.finditer(r'UpstreamReference', content):
        start = max(0, m.start()-100)
        end = min(len(content), m.end()+200)
        print(f"\n---USAGE at char {m.start()}---")
        print(content[start:end])
except FileNotFoundError:
    print("File not found")
EOF

echo ""
echo "---rg for UpstreamReference in yaml---"
rg -n "UpstreamReference" gateway/gateway-controller/api/management-openapi.yaml -A 5 -B 5

echo ""
echo "---validator usage of upstreamRefRegex and name validation---"
rg -n "upstreamRefRegex\|UpstreamReference\|nameRegex\|NameRegex\|RefPattern\|refPattern" --type go | grep -v "_test.go" | head -40

Length of output: 19523


🏁 Script executed:

#!/bin/bash
echo "===FULL VALIDATOR FILE - timeout and per-op sections==="
grep -n "timeout\|Timeout\|connect\|Connect\|ParseDuration\|regex\|Regex\|pattern\|Pattern\|upstreamRef\|UpstreamRef\|upstreamDefinitions\|UpstreamDefinition\|ref\|Ref" gateway/gateway-controller/pkg/config/api_validator.go | head -100

echo ""
echo "===FULL CLUSTERKEY FILE==="
cat gateway/gateway-controller/pkg/utils/clusterkey/clusterkey.go

echo ""
echo "===FULL RESTAPI.GO - cluster key usage==="
grep -n "clusterkey\|DefinitionName\|UpstreamDefinitionClusterPrefix\|constants\." gateway/gateway-controller/pkg/transform/restapi.go | head -40

echo ""
echo "===FULL TRANSLATOR.GO - cluster key usage==="
grep -n "clusterkey\|DefinitionName\|UpstreamDefinitionClusterPrefix\|constants\." gateway/gateway-controller/pkg/xds/translator.go | head -40

echo ""
echo "===CONSTANTS FILE==="
fd -i "constants.go" --type f | grep -i "gateway-controller" | xargs cat 2>/dev/null | head -60

echo ""
echo "===FULL RUNTIME DEPLOY CONFIG==="
cat gateway/gateway-controller/pkg/models/runtime_deploy_config.go

echo ""
echo "===POLICYXDS - ConnectTimeout usage==="
rg -rn "ConnectTimeout\|connect_timeout\|connectTimeout" --type go | grep -v "_test.go"

echo ""
echo "===OPENAPI YAML - full upstream and operation sections (500 lines around key areas)==="
python3 - <<'EOF'
try:
    with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if any(kw in line for kw in ["UpstreamReference", "UpstreamDefinition", "RestAPIOperationUpstream", "Upstream:", "timeout", "connect", "pattern:", "UpstreamTimeout"]):
            start = max(0, i-2)
            end = min(len(lines), i+10)
            print(f"---Line {i+1}---")
            print("".join(lines[start:end]))
except FileNotFoundError:
    print("File not found")
EOF

Length of output: 48950



@mehara-rothila — independent read complete. Here is the assessment for each item.


1. Connect-timeout validation

Verdict: Correct and consistent with the OpenAPI pattern.

api_validator.go Line 55:

connectTimeoutRegex: regexp.MustCompile(`^\d+(\.\d+)?(ms|s|m|h)$`),

management-openapi.yaml Line 4261:

pattern: '^\d+(\.\d+)?(ms|s|m|h)$'

These are identical.

The validation order (Lines 380–397) is correct:

  1. time.ParseDuration fails → format error with the first message (covers abc, gibberish).
  2. ParseDuration succeeds but regex fails → second message naming the unit constraint (covers -5s, +5s, 5ns, 100us, 1h30m).
  3. Regex passes but d <= 0 → positivity error (covers 0s and 0.0s, which are zero; 0.5ms is positive, so it passes).
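
The three-step order above can be sketched as follows. This is a minimal illustration, not the code in api_validator.go: the function name validateConnectTimeout and the error messages are hypothetical, and only the regex is taken from the line quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// connectTimeoutRegex mirrors the pattern quoted from api_validator.go and
// management-openapi.yaml.
var connectTimeoutRegex = regexp.MustCompile(`^\d+(\.\d+)?(ms|s|m|h)$`)

// validateConnectTimeout is a hypothetical helper; the real validator's
// function name and error messages may differ.
func validateConnectTimeout(v string) error {
	// Step 1: reject anything Go cannot parse as a duration.
	d, err := time.ParseDuration(v)
	if err != nil {
		return fmt.Errorf("invalid duration format %q: %w", v, err)
	}
	// Step 2: parseable, but outside the documented unit contract
	// (signs, ns/us units, compound values such as 1h30m).
	if !connectTimeoutRegex.MatchString(v) {
		return fmt.Errorf("unsupported form %q: want one number with unit ms, s, m, or h", v)
	}
	// Step 3: well-formed, but zero.
	if d <= 0 {
		return errors.New("connect timeout must be positive")
	}
	return nil
}

func main() {
	for _, v := range []string{"abc", "-5s", "5ns", "1h30m", "0s", "0.5ms", "5s"} {
		fmt.Printf("%-6s -> %v\n", v, validateConnectTimeout(v))
	}
}
```

Running ParseDuration first keeps the first error message reserved for input that is not a duration at all, while the regex message can name the unit constraint specifically.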

One observation: upstreamref.ParseConnectTimeout (upstreamref.go) does not apply the unit-contract regex; it uses only time.ParseDuration plus the positivity check. Values like 5ns would therefore pass ParseConnectTimeout at runtime if they somehow bypassed the validator. The validator is the correct enforcement point for this constraint, so this is not a bug in the current flow, but the asymmetry between the two parsers deserves a comment in ParseConnectTimeout to make the intent clear to future maintainers.


2. clusterkey.DefinitionName — cluster name equivalence

Verdict: Correct. Old sanitizers are fully removed; new function produces an equivalent name.

clusterkey.go:

func DefinitionName(kind, apiID, defName string) string {
    return constants.UpstreamDefinitionClusterPrefix + kind + "_" + apiID + "_" + sanitizeDefName(defName)
}
func sanitizeDefName(name string) string {
    name = strings.ReplaceAll(name, ".", "_")
    name = strings.ReplaceAll(name, ":", "_")
    return name
}

The old inline expression was UpstreamDefinitionClusterPrefix + kind + "_" + apiID + "_" + sanitizeUpstreamDefinitionName(definition.Name). The AI summary confirms the old sanitizer replaced dots and colons. The new sanitizeDefName does the same, so the output is identical.

Old sanitizer names (sanitizeEnvoyClusterName, sanitizeUpstreamDefinitionName, sanitizeClusterName) produce zero matches in production Go files — confirmed removed.

All four call sites (restapi.go Lines 266 and 452; translator.go Lines 946 and 1084) use the same argument order (kind, apiID, defName). No cluster-name divergence between paths. No 503/NoRoute risk.

One thing to note: the validator already rejects definition names with dots or colons via ^[a-zA-Z0-9\-_]+$, so for any newly created config the sanitization is a no-op. The replacement logic exists defensively for configs stored before the validator was tightened.
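
To make the no-op claim concrete, here is a standalone sketch of the naming function. The two functions are copied from the clusterkey.go excerpt above (renamed to lowercase for a single-file example). The prefix value "upstream_" is an assumption inferred from the upstream_<kind>_<apiID>_<defName> shape in the PR description, and the kind, API ID, and definition names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamDefinitionClusterPrefix stands in for
// constants.UpstreamDefinitionClusterPrefix. The value is an assumption.
const upstreamDefinitionClusterPrefix = "upstream_"

// definitionName follows the DefinitionName excerpt quoted above.
func definitionName(kind, apiID, defName string) string {
	return upstreamDefinitionClusterPrefix + kind + "_" + apiID + "_" + sanitizeDefName(defName)
}

// sanitizeDefName replaces dots and colons with underscores.
func sanitizeDefName(name string) string {
	name = strings.ReplaceAll(name, ".", "_")
	name = strings.ReplaceAll(name, ":", "_")
	return name
}

func main() {
	// A name the current validator accepts: sanitization changes nothing.
	fmt.Println(definitionName("main", "api-123", "orders-backend"))
	// prints "upstream_main_api-123_orders-backend"

	// A hypothetical legacy name stored before the validator was tightened.
	fmt.Println(definitionName("main", "api-123", "orders.v1:8080"))
	// prints "upstream_main_api-123_orders_v1_8080"
}
```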


3. UpstreamCluster.ConnectTimeout removed

Verdict: Clean removal. Nothing depends on it.

runtime_deploy_config.go: UpstreamCluster struct confirmed:

type UpstreamCluster struct {
    Name      string
    BasePath  string
    Endpoints []Endpoint
    TLS       *UpstreamTLS
}

No ConnectTimeout field present. A search for ConnectTimeout and connect_timeout in all non-test Go files returns no matches. The field is fully gone with no remaining dependents.

Connect timeout still reaches Envoy: it flows as an argument directly into createCluster in translator.go at xDS translation time, bypassing the RDC model entirely. RouteTimeout.Connect on the Route struct is a separate, unrelated concept (per-route timeout) and is not affected.


4. Shared UpstreamReference schema

Verdict: Well-formed and consistently applied.

UpstreamReference (yaml Line 4210):

type: string
minLength: 1
maxLength: 100
pattern: '^[a-zA-Z0-9\-_]+$'

Used in three places:

| Schema | Field | Line |
| --- | --- | --- |
| UpstreamDefinition | name | 4226 |
| Upstream | ref | 4277 |
| RestAPIOperationUpstreamTarget | ref | 4343 |

The validator's upstreamRefRegex (^[a-zA-Z0-9\-_]+$) is an exact match with the OpenAPI pattern. The max-100/non-empty checks in the validator align with maxLength: 100 / minLength: 1 in the schema.

Using UpstreamReference for both definition name and for ref values creates a useful structural invariant: any definition name that passes validation is guaranteed to be directly referenceable from a per-op upstream target without a separate character-set conversion. This is sound.


Previously noted concern still open

The anyOf union boilerplate in generated.go (RestAPIOperationUpstream0/1, MarshalJSON delegating to union json.RawMessage) was raised in the prior review. That was not part of this fix set — the anyOf is still in the schema (yaml Line 4327) and the union code is still generated. The MarshalJSON-null risk on the GET response path remains.
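
For context, the null behaviour can be reproduced in isolation. The types below are hypothetical stand-ins that imitate the union-wrapper shape described above (a private json.RawMessage field with MarshalJSON delegating to it); they are not the generated code, and whether the real field is a pointer or a value is not confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// operationUpstream imitates a generated anyOf union wrapper (hypothetical).
type operationUpstream struct {
	union json.RawMessage
}

// MarshalJSON delegates to the raw union bytes. A nil RawMessage
// marshals to the JSON literal null.
func (t operationUpstream) MarshalJSON() ([]byte, error) {
	return t.union.MarshalJSON()
}

// operation is a hypothetical response model holding the union.
type operation struct {
	Path     string             `json:"path"`
	Upstream *operationUpstream `json:"upstream,omitempty"`
}

// encode marshals an operation and returns the JSON text.
func encode(op operation) string {
	b, err := json.Marshal(op)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Non-nil wrapper whose union was never populated: emitted as null.
	fmt.Println(encode(operation{Path: "/orders", Upstream: &operationUpstream{}}))
	// prints {"path":"/orders","upstream":null}

	// Nil pointer with omitempty: the key is dropped entirely.
	fmt.Println(encode(operation{Path: "/orders"}))
	// prints {"path":"/orders"}
}
```

Under these assumptions, the difference between the two outputs is what a client of the GET response would observe, so a test that round-trips an operation without a per-op upstream would show which case applies.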

Cover the upstreamDefinition name contract (max length and allowed pattern) in
the validator, and add xDS-path tests for a per-op main override reusing the
referenced definition cluster and for the same-vhost rejection guard.
@mehara-rothila mehara-rothila marked this pull request as ready for review June 22, 2026 05:21